December 11, 2018

Presentation Objectives

  • Provide example uses of R with stormwater data
  • Opine on strengths, weaknesses, perspectives

My Background

  • Staff in Municipal Stormwater Permitting Unit (i.e. the MS4 Unit)
  • Some rudimentary background in coding
  • Enjoy learning new things
  • Appreciate when complex information is communicated elegantly and/or succinctly

Disclaimer

  • I am not an expert
  • What I am presenting may be inefficient
  • Tools, packages, and best practices evolve

Background of R

  • Programming Language and Environment for Statistical Computing and Graphics
  • Open-source
  • Large ecosystem of freely available packages

Analysis

  • Renewing Phase I MS4 Permits -> "Regional MS4 Permit"
    • Evaluate monitoring data (~50 TMDLs) before permit consideration
    • Present our monitoring data analyses to the LA Regional Board in a series of 3 workshops (completed earlier this year)
    • Produce a monitoring data review report covering all the data (in progress)

Example 1: Bacteria Data Review

  • Several Bacteria TMDLs
    • Sites: ~ 100 receiving water monitoring sites
    • Water Type: Marine or Fresh
    • Beneficial Use: REC-1, LREC-1, REC-2
    • Frequency: Typically Weekly
    • Period: 10 Years
    • Parameters: 4 Indicator Bacteria
    • Limitations: Daily and Geo Mean

Example 1: Bacteria Data Review

\(\text{Estimate} \approx (52 \text{ weeks})(10 \text{ years})(4 \text{ parameters})(100 \text{ sites})\)
\(\approx 200{,}000 \text{ data points}\)
\(\approx 200{,}000 \text{ geometric means}\)
\(\approx 400{,}000 \text{ comparisons}\)
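
The estimate above can be reproduced in a few lines of R. This is only a back-of-the-envelope sketch; it assumes one geometric mean is evaluated per data point and that each data point is checked against two limits (daily and geometric mean), which is how the slide's totals appear to be derived.

```r
# Rough size of the bacteria dataset, using the counts from the slide
weeks_per_year <- 52
years          <- 10
parameters     <- 4
sites          <- 100

data_points <- weeks_per_year * years * parameters * sites
data_points
# 208000, i.e. roughly 200,000

# Assumption: one geometric mean per data point, so checking the daily
# limit and the geometric mean limit doubles the number of comparisons
geometric_means <- data_points
comparisons     <- data_points + geometric_means
comparisons
# 416000, i.e. roughly 400,000
```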

Example 1: Bacteria Data Review

Excel VBA Tool

  • Friendly User Interface
  • Not Adaptable
  • Slow
  • Black Box
  • Difficult to QA
  • Difficult to Update

Example 1: Bacteria Data Review

library(bacteria)

### Load and Clean Data Code

### Analyze Data
results <- bact_check(data, stations, "REC-1", "marine", 
                      sub_ecoli_for_fecal = TRUE, six_week = FALSE)
exceeds <- bact_ann_exceeds(data, stations, "REC-1", "marine", 
                            sub_ecoli_for_fecal = TRUE, six_week = FALSE)

### Plot Results Code

### Export Results Code
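
To give a sense of what a check like the one above might involve, here is a minimal base R sketch of a rolling geometric mean compared against limits. It is not the code behind `bact_check()`; the helper `geo_mean()`, the sample values, the 30-day window, and both limit values are hypothetical and chosen only for illustration.

```r
# Geometric mean of a vector of positive concentrations
geo_mean <- function(x) exp(mean(log(x)))

# Hypothetical weekly samples at one site (units: MPN/100 mL)
samples <- data.frame(
  date   = as.Date("2018-01-01") + 7 * 0:5,
  result = c(20, 150, 35, 400, 10, 60)
)

# Hypothetical limits for illustration only; not taken from any TMDL
single_sample_limit <- 104
geo_mean_limit      <- 35

# Rolling geometric mean over the 30 days ending on each sample date
samples$geo_mean <- vapply(seq_len(nrow(samples)), function(i) {
  d <- samples$date[i]
  in_window <- samples$date > d - 30 & samples$date <= d
  geo_mean(samples$result[in_window])
}, numeric(1))

# Flag exceedances of each limit
samples$exceeds_single <- samples$result > single_sample_limit
samples$exceeds_geo    <- samples$geo_mean > geo_mean_limit

samples
```

Because every intermediate column stays in the data frame, each step of the analysis can be inspected directly.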

Example 1: Bacteria Data Review

  • Need to know R to use
  • Adaptable
  • Fast (seconds vs. minutes)
  • Transparent -> Follow Analysis Step-by-Step
  • Easier to QA / Unit Test
  • Easy to Update
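
As a rough illustration of the QA point, a function can be checked against inputs with known answers. The sketch below uses only base R's `stopifnot()`; the helper `count_exceedances()` and the limit value are hypothetical and are not part of the `bacteria` package shown earlier.

```r
# Hypothetical helper: count results above a limit, ignoring missing values
count_exceedances <- function(results, limit) {
  sum(results > limit, na.rm = TRUE)
}

# Checks with known answers; stopifnot() raises an error if any is FALSE
stopifnot(
  count_exceedances(c(10, 200, 50), limit = 104) == 1,
  count_exceedances(c(NA, 200, 300), limit = 104) == 2,  # NA is skipped
  count_exceedances(numeric(0), limit = 104) == 0,       # no data, no exceedances
  count_exceedances(c(104, 104), limit = 104) == 0       # equal to the limit is not above it
)
```

Rerunning checks like these after every change is one way to catch regressions that would be hard to spot in a spreadsheet tool.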

Example 2: Bacteria Heatmap

Example 3: Exceedance Map

Visualizations

  • ggplot2 package
  • Standardize formatting
  • Automate plotting for large datasets
  • Replot when data updated
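
The points above can be sketched as follows, assuming the ggplot2 package is installed. The data frame, station names, and the helper `plot_station()` are made up for illustration.

```r
library(ggplot2)

# Hypothetical monitoring results; station names and values are made up
monitoring <- data.frame(
  station = rep(c("Site A", "Site B"), each = 4),
  date    = rep(as.Date("2018-01-01") + 7 * 0:3, times = 2),
  result  = c(12, 40, 25, 90, 300, 150, 80, 210)
)

# One function holds the formatting, so every plot looks the same
plot_station <- function(df) {
  ggplot(df, aes(x = date, y = result)) +
    geom_line() +
    geom_point() +
    scale_y_log10() +
    labs(title = df$station[1], x = NULL, y = "Result (MPN/100 mL)") +
    theme_bw()
}

# One plot per station; rerun after the data changes to regenerate all of them
plots <- lapply(split(monitoring, monitoring$station), plot_station)

# Display the first plot; ggsave() could write each one to a file instead
plots[["Site A"]]
```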

Example 4: Boxplots

Facets

  • Plot subsets of data side-by-side for comparison
  • The right facet can highlight and identify key information about a dataset
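
A minimal faceting sketch, assuming ggplot2 is installed. The constituents, units, and simulated values are hypothetical and stand in for real monitoring data.

```r
library(ggplot2)

# Hypothetical results for two constituents in dry and wet weather
set.seed(1)
results <- data.frame(
  constituent = rep(c("Copper", "Zinc"), each = 20),
  weather     = rep(c("Dry", "Wet"), times = 20),
  result      = c(rlnorm(20, log(10), 0.5), rlnorm(20, log(80), 0.5))
)

# One panel per constituent, each with its own y-axis range
p <- ggplot(results, aes(x = weather, y = result)) +
  geom_boxplot() +
  facet_wrap(~ constituent, scales = "free_y") +
  labs(x = NULL, y = "Result (ug/L)")
p
```

Swapping the faceting variable (for example `facet_wrap(~ weather)` with `x = constituent`) changes which comparison the reader sees first, which is why the choice of facet matters.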

Example 5: Facet by Constituent

Example 6: Facet by Weather

Example 7: No Facet

Example 8: Facet by Station

Example 9: Geofaceting

Interactive Plots

Example 10: RAA Model Output (Pre vs. Post EWMP)

Example 11: Time Series Graphs

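library(dygraphs)
library(dplyr)
library(readxl)
library(xts)

# Read the flow record from the Excel workbook
f319 <- read_excel("data/F319.xlsx")

# Keep only the date and total flow columns
f319 <- f319 %>%
  dplyr::select(Date, Total) %>%
  as.data.frame()

# Build a time series of total flow indexed by date
data <- xts(f319$Total, order.by = f319$Date)

# Interactive time series plot with a draggable date range selector
dygraph(data, main = "Los Angeles River Flow Gage F319") %>%
  dyAxis("y", label = "Flow (AF/day)") %>%
  dyRangeSelector()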

Example 12: Mapping (IGP Example)

  1. From the SMARTS Storm Water Data File Download menu, download the tab-delimited text file of "Storm Water Applications - General Information" for Region 4.
  2. Load the file into R and filter for active IGP sites.
  3. Use the leaflet package to plot the sites on a map and display site-specific information on mouseover.

Example 12: Mapping (IGP Example)

library(dplyr)
library(leaflet)
library(htmltools)

# Load File
filename <- "data/smarts_swapps_geninfo_2018-12-09.txt"
rb4 <- read.delim(filename, sep = "\t", stringsAsFactors = FALSE)

# Filter IGP Sites
rb4igp <- rb4 %>% 
  filter(PERMIT_TYPE == "Industrial") %>%
  filter(STATUS == "Active")

# Plot Map Using Leaflet (code for labels omitted due to space)
leaflet(rb4igp) %>% addTiles() %>%
  addCircles(lng = ~FACILITY_SITE_LONGITUDE, lat = ~FACILITY_SITE_LATITUDE,
             opacity = 0.75, label = lapply(labs, HTML))
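
The label code was omitted from the slide. One way `labs` could be built is sketched below in base R: one HTML string per site, which `lapply(labs, HTML)` then marks as HTML for leaflet. The data frame, the column names `FACILITY_NAME` and `WDID`, and the values are hypothetical stand-ins, not confirmed fields of the SMARTS file.

```r
# Toy stand-in for rb4igp; column names and values are hypothetical
sites <- data.frame(
  FACILITY_NAME = c("Example Metals", "Example Plating"),
  WDID          = c("ID-0001", "ID-0002"),
  stringsAsFactors = FALSE
)

# One HTML string per site; <br/> puts each field on its own line
labs <- sprintf("<strong>%s</strong><br/>WDID: %s",
                sites$FACILITY_NAME, sites$WDID)
labs
```

Since `sprintf()` is vectorized, the same call works unchanged for thousands of sites.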

Example 13: Mapping Monitoring Data

R Shiny Web App

Link

Other Examples

Example 14: Scraping Tables from PDFs

  • File: PDF Report on Trash Assessments in San Mateo County, CA
  • Objective: Want to get data from tables within the document for data analysis, graphs, Excel, etc.

Example 14: Scraping Tables from PDFs

library(tabulizer)

report <- "pdf/San Mateo trash RTAs 2006-07.pdf"
lst <- extract_tables(report, encoding="UTF-8")
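
Extracted tables usually need some cleanup before analysis. The sketch below assumes each element of `lst` is a character matrix whose first row holds the column headers, which may not be true for every table or package version. The matrix `tbl` and its values are made up to stand in for one extracted table.

```r
# Stand-in for one element of `lst`; the values are made up
tbl <- matrix(
  c("Site", "Date",       "Items",
    "A-1",  "2006-10-03", "112",
    "A-2",  "2006-10-04", "57"),
  ncol = 3, byrow = TRUE
)

# Promote the first row to column names and drop it from the body
df <- as.data.frame(tbl[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(df) <- tbl[1, ]

# Extracted cells are text, so convert types explicitly
df$Date  <- as.Date(df$Date)
df$Items <- as.numeric(df$Items)

df
# write.csv(df, "trash_table.csv", row.names = FALSE) would export it for Excel
```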

Example 15: Timelines

Example 16: Network Graphs (Permittees - TMDLs)

Example 17: Network Graphs (Permittees - WMPs/EWMPs)

Statistics

Data Science

  • R is a popular language for "data science"
  • Several packages for cleaning, wrangling, and "tidying" data
  • Several machine learning and deep learning packages

Example 18: Webpages

  • R Markdown is a format that can be used to author webpages and other types of documents
  • Allows R code to be used when authoring webpages
  • See knitr, rmarkdown, bookdown, blogdown packages
  • Example

Example 19: Presentations

This is an R Markdown presentation. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

RStudio

  • Free and Open-Source "Integrated Development Environment (IDE)" for R
  • Highly, highly recommended if you want to use R!!!

What does R / RStudio look like?

Strengths

  • Repeatable
  • Transparent
  • Automation
  • Packages
  • (Many of these advantages apply to other programming languages, such as Python, as well)

Weaknesses

  • Learning Curve
  • Sometimes I have nothing to show for the day
  • Sometimes I can lose sight of the big picture

Final Thoughts

  • R and its packages are just another set of tools (they do not generate insight)
  • ~80% of my time working with data is spent cleaning data
  • Can help as a tool to communicate information, build analyses, and handle large datasets
  • Gateway to learning about other technologies and topics (HTML/CSS/JavaScript, Git and GitHub, Statistics, Data Science, Visualizations)
  • Can change the way you see data

End